Drug-related deaths in the United States of America

A case study

Introduction

Over the last couple of years, we have often read about drug overdoses being a big problem in the US. And of course we have all seen Netflix's Narcos. This made us want to dig deeper into the drug problem in the US. Robert R. Redfield, the director of the Centers for Disease Control and Prevention in the US, recently made a statement and said: “These statistics are a clear warning that we are losing too many Americans too early, too often, for reasons we can prevent.” He is referring to recent statistics on overdoses and suicides in the US. In summary, we want to have a look at how bad things are in the land of dreams.

More specifically, we want to see which states have the most and the fewest overdoses (ODs), scaled by population per state. We also take a look at the types of drugs involved in ODs, and try to see if there is a correlation between ODs and income, unemployment, weather and population. It would also be interesting to see whether ODs occur more or less often during different months of the year. At the end we use our data to create a regression model that tries to explain ODs, but don't get your hopes up for that one. As we read more and ran more tests, we quickly found out that it is not easy to explain why people end up overdosing on drugs, since life is complicated and humans are not rational. We will try to illustrate our findings in the most convenient way. As mentioned, several things motivated us to write this project. Here is an example of a relevant article: https://www.nrk.no/urix/forventet-levealder-i-usa-faller-1.14324124. We hope you enjoy our work. The visual presentation of the project has been our highest priority. All the code can be found on our GitHub: https://github.com/omyrland/Data_Science_exam.git

The screencast for our project can be viewed at the following link: https://www.dropbox.com/s/jw3awjzfl12588n/screenkast.mp4?dl=0

The data

The data in the project is the result of scraping, gathering, manipulating and arranging multiple data sets with tens of thousands of observations. All the data can be found in the .rmd document on our GitHub. Feel free to download the file, take a look at the data and do your own calculations. Remember to download the css file as well, so that you can compile the document as html in rmarkdown.

This project contains multiple sources of data and datasets:

-VSRR Provisional Drug Overdose Death Counts

The main source of data is contained in this dataset. It contains provisional counts for drug overdose deaths based on a current flow of mortality data in the National Vital Statistics System. National provisional counts include deaths occurring within the 50 states and the District of Columbia as of the date specified and may not include all deaths that occurred during a given time period. Provisional counts are often incomplete and causes of death may be pending investigation resulting in an underestimate relative to final counts.

Other smaller datasets:

-State population 2015 - Infoplease

-State population 2016 - Dilemma X

-State population 2017 - Wikipedia

-State population 2018 - World Population Review

-Local Area Unemployment Statistics

-Average Annual Sunshine by State

-Average Annual Temperature for Each US State

-Average Annual Precipitation by State

-Median Household income

-List of Latitudes and Longitudes for every State

Summary of study

In this study we have examined deaths by drugs in the United States of America. Drug data, statistics and presentations usually focus on grouping by age, but not in this study. We wanted to visualise the geographical variation and explore the data for each individual state in the US. The data range from 2015 to 2018, with 2015 and 2016 as the most reliable years. The last two years contain incomplete data due to legal factors, such as causes of death that are still pending investigation. Nevertheless, the discoveries we made were very interesting. By scaling the overdoses by each state's population, we could compare drug deaths across states on equal terms. District of Columbia, West Virginia and Ohio had the highest overdose rates in the US in 2017. The overdose rate in DC and Ohio almost doubled from 2015 to 2017. Take a look at the bar plots divided into regions to see the overall development per state from 2015 to 2017. Iowa, South Dakota and Nebraska are the states with the lowest registered rates of deaths by drugs.

We have compared the overdose rate with the lists of states with the highest and lowest income, temperatures and unemployment. We found some correlation, but not as strong as expected. This implies that overdosing on drugs is explained by many factors and is a complicated issue. This assumption was backed by our linear regression model, which only achieved an R2 value of 14.5 percent, meaning that most of the variation in the number of deaths by drugs is left unexplained. The key numbers for each state are visualised in the interactive map below. The most used drugs are opioids, synthetic opioids and heroin. On average, 2.45 % of all deaths in the US were drug related in 2017. The average number of drug deaths per state was 11 585 in 2015 and, worse, increased to 15 856 in 2017. This is an increase of 36.9 % from 2015 to 2017, implying that it is a big problem for all the states in the US.

Overall, this has been a fun and educational experience that we are glad to have been through.

Interactive map with summary of every state

Click the popup icon to read about the statistics for each state.

Let’s take a look at the total number of deaths by drugs per 10 000 inhabitants in each state

We scale the number of deaths by the number of inhabitants in every state, so that states of different sizes can be compared on equal terms. Then we divide all of the states into four regions:

Northeast

Midwest

South

West
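The scaling described above can be sketched as follows. This is a minimal illustration in Python, not the R code used in the project; the state names and figures are hypothetical and the helper `deaths_per_10k` is an assumed name.

```python
# Hypothetical illustration: the state names and figures below are made up
# and are NOT taken from the project's data.

def deaths_per_10k(deaths, population):
    """Return the number of deaths per 10 000 inhabitants."""
    return deaths * 10_000 / population

states = {
    "State A": {"deaths": 1_200, "population": 600_000},
    "State B": {"deaths": 9_000, "population": 9_000_000},
}

# Scale every state by its own population.
rates = {
    name: deaths_per_10k(d["deaths"], d["population"])
    for name, d in states.items()
}

# Rank the states from the highest to the lowest rate.
ranking = sorted(rates, key=rates.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {rates[name]:.1f} deaths per 10 000 inhabitants")
```

In this made-up example State B has far more deaths in absolute terms, but State A ranks higher once population is taken into account, which is the point of the scaling.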

Deaths by drugs for all states compared to one another in 2016:

How did they rank in 2017?

State Deaths per 10 000 inhabitants Rank
District of Columbia 68 1
West Virginia 64 2
Ohio 52 3
Pennsylvania 50 4
Maryland 46 5
Kentucky 43 6
Delaware 42 7
New Hampshire 41 8
Massachusetts 38 9
Rhode Island 38 10
Connecticut 35 11
Maine 34 12
Florida 34 13
Tennessee 32 14
Indiana 32 15
New Jersey 32 16
Michigan 32 17
Nevada 29 18
New Mexico 29 19
Missouri 28 20
North Carolina 27 21
Louisiana 27 22
Vermont 26 23
Oklahoma 26 24
Arizona 25 25
South Carolina 24 26
Utah 24 27
Illinois 24 28
Wisconsin 24 29
Colorado 22 30
Alaska 21 31
Virginia 21 32
Alabama 20 33
Washington 18 34
Georgia 18 35
Idaho 16 36
Hawaii 16 37
Arkansas 16 38
Minnesota 15 39
Wyoming 15 40
Oregon 15 41
California 15 42
New York 15 43
Montana 14 44
North Dakota 13 45
Mississippi 13 46
Kansas 13 47
Texas 13 48
Iowa 13 49
South Dakota 10 50
Nebraska 8 51

You probably don’t wanna take your kids on vacation to DC, West Virginia or Ohio

These are some serious numbers. The average number of drug deaths per state was 11 585 in 2015 and, worse, increased to 15 856 in 2017. This is an increase of 36.9 % from 2015 to 2017, implying that it is a serious problem in the US. The total number of deaths by drugs was 590 825 in 2015, 682 084 in 2016 and 808 661 in 2017. This adds up to a total of 2 081 570 people for just the three years covered by this case study. To put things in perspective, that is the same as the total population of Slovenia. Wiped out over three years.
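The arithmetic behind these figures can be reproduced with a few lines of Python (a sketch, not the project's R code). The yearly totals are the ones quoted above; dividing by 51 (the 50 states plus the District of Columbia) is our assumption about how the per-state average was obtained, and it matches the quoted averages.

```python
# Yearly totals of registered drug deaths, as quoted in the text.
totals = {2015: 590_825, 2016: 682_084, 2017: 808_661}

# ASSUMPTION: the per-state average divides by 51 units (50 states + DC).
n_units = 51

averages = {year: total / n_units for year, total in totals.items()}

# Relative increase in the per-state average from 2015 to 2017, in percent.
increase_pct = (averages[2017] - averages[2015]) / averages[2015] * 100

# Sum over the three years covered by the case study.
three_year_total = sum(totals.values())

print(f"Average per state 2015: {averages[2015]:.0f}")
print(f"Average per state 2017: {averages[2017]:.0f}")
print(f"Increase 2015-2017: {increase_pct:.1f} %")
print(f"Total 2015-2017: {three_year_total}")
```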

How many drug deaths do we find compared to all deaths?

On average, 1.9 % of all deaths in the US were drug related in 2015

On average, 2.2 % of all deaths in the US were drug related in 2016

On average, 2.45 % of all deaths in the US were drug related in 2017

Will low income and high unemployment result in a higher overdose rate?

2016 numbers

Our assumption is that low income and high unemployment go together with a high overdose rate. So we take a look at the top 10 states for highest overdose rate, lowest income and highest unemployment.

Scatter plot for median household income

State Deaths per 10 000 inhabitants Rank
West Virginia 53 1
New Hampshire 39 2
Ohio 39 3
District of Columbia 38 4
Rhode Island 38 5
Kentucky 36 6
Pennsylvania 36 7
Massachusetts 35 8
Maryland 34 9
Connecticut 30 10

State Median household income 2016 (USD) Rank
Mississippi 40528 51
Arkansas 42336 50
West Virginia 42644 49
Alabama 44758 48
Kentucky 44811 47
Louisiana 45652 46
New Mexico 45674 45
Tennessee 46574 44
South Carolina 46898 43
Oklahoma 48038 42

State Unemployment rate 2016 (percent of the state's labor force) Rank
Alaska 6.9 51
New Mexico 6.7 50
District of Columbia 6.1 48
West Virginia 6.1 48
Louisiana 6.0 47
Alabama 5.9 46
Illinois 5.8 44
Mississippi 5.8 44
Nevada 5.7 43
California 5.5 42

As we can see, this is not always the case. West Virginia stands out and does badly in all three categories. The people working in DC make good money, but both unemployment and OD rates are high. Kentucky appears in the table for low income as well as the table for high overdose rates. Maryland has been the state with the highest income over the last couple of years by a solid margin, and is still the 9th worst place for overdoses in the country. Otherwise the similarities were not as strong as expected.

The correlation coefficient between overdoses and unemployment was 0.2497 (24.97 %) in 2016.

The correlation coefficient between overdoses and income was 0.0115 (1.15 %) in 2016.
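For readers who want to see what such a correlation figure measures, here is a minimal Python sketch of the Pearson correlation coefficient. It is not the R code from the project, the function name `pearson_r` is our own, and the two short input lists are hypothetical values, not the project's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# HYPOTHETICAL values for four states (not taken from the project's data).
unemployment_rate = [3.0, 4.0, 5.0, 6.0]
deaths_per_10k = [12.0, 30.0, 18.0, 25.0]

r = pearson_r(unemployment_rate, deaths_per_10k)
print(f"Correlation coefficient: {r:.2f}")
```

A value close to 1 or -1 indicates a strong linear relationship, while a value close to 0, like the income figure above, indicates almost none.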

But high income and low unemployment would mean a low OD rate, right?

State Deaths per 10 000 inhabitants Rank
Nebraska 7.5 1
South Dakota 9.3 2
North Dakota 10.9 3
Texas 11.7 4
Iowa 11.9 5
New York 12.6 6
Kansas 13.0 7
Mississippi 13.2 8
Minnesota 13.9 9
Montana 14.2 10

State Median household income 2016 (USD) Rank
Maryland 76067 1
Alaska 74444 2
New Jersey 73702 3
District of Columbia 72935 4
Hawaii 71977 5
Connecticut 71755 6
Massachusetts 70954 7
New Hampshire 68485 8
Virginia 66149 9
California 63783 10

State Unemployment rate 2016 (percent of the state's labor force) Rank
New Hampshire 2.9 1
Hawaii 2.9 1
South Dakota 3.0 3
North Dakota 3.1 4
Nebraska 3.1 4
Vermont 3.2 6
Colorado 3.3 7
Utah 3.4 8
Iowa 3.6 9
Maine 3.8 10

Not necessarily.

What about the states with high and low temperatures?

State Average annual temperature (°C) Rank (warmest first)
Florida 22 1
Hawaii 21 2
Louisiana 19 3
Texas 18 4
Georgia 18 5
Mississippi 17 6
Alabama 17 7
South Carolina 17 8
Arkansas 16 9
Arizona 16 10

State Average annual temperature (°C) Rank (coldest first)
Alaska -3.0 1
North Dakota 4.7 2
Maine 5.0 3
Minnesota 5.1 4
Wyoming 5.6 5
Montana 5.9 6
Vermont 6.1 7
Wisconsin 6.2 8
New Hampshire 6.6 9
Michigan 6.9 10

New Hampshire is the only state that appears in the top ten list for low temperatures as well as the list for most overdoses. Similarly, Texas and Mississippi are the only states that appear in both the top ten list for high temperatures and the list for the fewest overdoses. It makes sense that weather and overdoses are not strongly correlated.

The correlation coefficient between overdoses and temperature was 0.3198 (31.98 %) in 2016.

The correlation coefficient between overdoses and population was 0.868 (86.8 %) in 2016.

Average number of drug overdoses per month 2015-2018 per state

Our observation is that the average number of deaths related to overdoses appears to increase roughly linearly from month to month and from year to year between 2015 and 2018. The decrease in 2018 is probably due to incomplete data.

Average number of deaths by drugs vs average number of total deaths 2015-2018

From the graphs and the correlation test (cor = 0.95), we can see a strong linear relationship between the mean number of overdoses and the mean number of deaths. The p-value is 0.05, which sits right at the conventional 5 % significance threshold, so the relationship is borderline significant rather than clearly established.
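As a rough check of how a correlation of 0.95 can come with a p-value of about 0.05, the sketch below runs the usual t-test for a correlation coefficient in Python (not the project's R code). The sample size n = 4, one mean per year from 2015 to 2018, is our assumption and is not stated in the text; the function names are our own.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing whether a correlation r from n pairs is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def two_sided_p_df2(t):
    """Two-sided p-value of Student's t distribution.

    Closed form that is valid ONLY for 2 degrees of freedom.
    """
    return 1 - abs(t) / sqrt(2 + t * t)

r = 0.95  # correlation reported in the text
n = 4     # ASSUMPTION: four yearly means (2015-2018), giving n - 2 = 2 degrees of freedom

t = t_statistic(r, n)
p = two_sided_p_df2(t)

print(f"t = {t:.2f}, two-sided p = {p:.3f}")
```

Under this assumption the test reproduces a p-value of 0.05, which also shows how few observations stand behind the result.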

Remember that the number of incidents does not equal the number of deaths, since people can be affected by several drugs at the time of death, and all of them would be registered.

What will a linear regression model say about overdoses as the dependent variable, with income, weather and unemployment as independent variables?

## 
## Call:
## lm(formula = OD16_relation ~ Median_income16 + PrecipitationMM + 
##     Clear_days + Rate2016, data = comparedata)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.001323 -0.000648 -0.000169  0.000505  0.002511 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      3.35e-04   1.45e-03    0.23    0.818  
## Median_income16  1.02e-08   1.48e-08    0.69    0.496  
## PrecipitationMM  4.60e-07   3.84e-07    1.20    0.238  
## Clear_days      -3.18e-06   5.20e-06   -0.61    0.544  
## Rate2016         2.68e-04   1.34e-04    2.01    0.051 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9e-04 on 45 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.145,  Adjusted R-squared:  0.0687 
## F-statistic:  1.9 on 4 and 45 DF,  p-value: 0.126

R2 measures the goodness of fit of our model. As we can see, only 14.5 % of the variance in the dependent variable is explained by the independent variables, and the adjusted R2 is even lower at 6.9 %. The F-test for the whole model has a p-value of 0.126, so the model as a whole is not significant at the 5 % level. This means that the model leaves most of the variation in the overdose ratio unexplained, and that there is no clear linear relationship between the overdose ratio and income, weather and unemployment. If we run the model with fewer independent variables, even less of the variance is explained.
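The adjusted R-squared and the F-statistic in the output follow directly from R2, the number of observations and the number of predictors. The Python sketch below (not the project's R code, with function names of our own) recomputes them from the rounded values printed above: R2 = 0.145, 4 predictors and 45 residual degrees of freedom, which implies 50 observations.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (plus intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F-statistic of a linear model with k predictors and an intercept."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

r2 = 0.145  # Multiple R-squared, rounded as printed in the summary
k = 4       # Median_income16, PrecipitationMM, Clear_days, Rate2016
n = 50      # 45 residual degrees of freedom + 4 predictors + 1 intercept

adj = adjusted_r2(r2, n, k)
f = f_statistic(r2, n, k)

print(f"Adjusted R-squared: {adj:.3f}")
print(f"F-statistic: {f:.1f} on {k} and {n - k - 1} DF")
```

Because the input R2 is rounded, the results agree with the printed summary only to about two significant digits.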

Heatmap of drug ratio US 2016

The graphic displayed above is a heat map that represents the data as colors according to their value. The “warmer” the colors are, the higher the value of the data is in that area of the map. Our observation is that the east coast of the United States has the highest death counts related to overdoses. This is most likely because there are more states and more people living in this part of the country compared to the rest of the US.

Final words

We hope the US gets the opportunity to reverse this tragic trend in the future. The data is intimidating and, as stated in the introduction, too many Americans die too early. It is truly a big public problem. Thank you for reading.